Foreword



This volume contains one invited and fifteen submitted papers presented at the
Tenth International Conference on Inductive Logic Programming (ILP 2000).
The fifteen accepted papers were selected by the program committee from the 
37 papers submitted to the conference. Each paper was carefully reviewed by 
three referees. 

ILP 2000 was held at Imperial College, London, 24-27 July 2000 and was inte- 
grated with the First International Conference on Computational Logic 
(CL 2000). With ILP’s strong roots in computational logic, this was a natu- 
ral marriage. CL 2000 was a five-day extravaganza, incorporating both the Sixth 
International Conference on Rules and Objects in Databases (DOOD 2000) and 
the Tenth International Workshop on Logic-based Program Synthesis and Trans- 
formation (LOPSTR 2000) and featuring eight invited speakers, twelve tutorials, 
and seven affiliated workshops. Registrants for CL 2000 and ILP 2000 could move 
freely between the two events, the main distinction between the events being sep- 
arate conference proceedings. 

We wish to thank all the authors who submitted their papers to ILP 2000;
the program committee members and other reviewers, who did a thorough job in
spite of demanding deadlines; and our invited speaker, David Page. Thanks also
to Alfred Hofmann and everyone else at Springer for their smooth handling of
these proceedings. We would also like to thank the organisers of CL 2000, whose
cooperation brought the two events together: John Lloyd (Program Chair), 
Marek Sergot (Conference Chair), Frank Kriwaczek and Francesca Toni (Lo- 
cal Organisers), Femke van Raamsdonk (Publicity Chair) and Sandro Etalle 
(Workshop Chair). Finally, we are grateful to our sponsors for their financial 
support. 
Abstract. Inductive logic programming (ILP) is built on a foundation
laid by research in other areas of computational logic. But in spite of this 
strong foundation, at 10 years of age ILP now faces a number of new chal- 
lenges brought on by exciting application opportunities. The purpose of 
this paper is to interest researchers from other areas of computational 
logic in contributing their special skill sets to help ILP meet these chal- 
lenges. The paper presents five future research directions for ILP and 
points to initial approaches or results where they exist. It is hoped that 
the paper will motivate researchers from throughout computational logic 
to invest some time into “doing” ILP. 



1 Introduction 

Inductive Logic Programming has its foundations in computational logic, including
logic programming, knowledge representation and reasoning, and automated
theorem proving. These foundations go well beyond the obvious basis in definite
clause logic and SLD-resolution. In addition, ILP has relied heavily on such
theoretical results from computational logic as Lee’s Subsumption Theorem [?],
Gottlob’s Lemma linking implication and subsumption, Marcinkowski and
Pacholski’s result on the undecidability of implication between definite clauses
[?], and many others. Beyond such theoretical results, ILP depends
crucially on important advances in logic programming implementations.
For example, many of the applications summarized in the next brief section
were possible only because of fast deductive inference based on indexing,
partial compilation, etc. as embodied in the best current Prolog implementations.
Furthermore, research in computational logic has yielded numerous important
lessons about the art of knowledge representation in logic that have formed the
basis for applications. Just as one example, definite clause grammars are
central to several ILP applications within both natural language processing and
bioinformatics.

ILP researchers fully appreciate the debt we owe to the rest of computational 
logic, and we are grateful for the foundation that computational logic has pro- 
vided. Nevertheless, the goal of this paper is not merely to express gratitude, but 
also to point to the present and future needs of ILP research. More specifically,
the goal is to lay out future directions for ILP research and to attract researchers
from the various other areas of computational logic to contribute their unique
skill sets to some of the challenges that ILP now faces.¹ In order to discuss these
new challenges, it is necessary first to survey briefly some of the most challenging
application domains of the future. Section 2 provides such a review. Based on
this review, Section 3 details five important research directions and concomitant
challenges for ILP, and Section 4 tries to “close the sale” in terms of attracting
new researchers.



2 A Brief Review of Some Application Areas 

One of the most important application domains for machine learning in general
is bioinformatics, broadly interpreted. This domain is particularly attractive for
(1) its obvious importance to society, and (2) the plethora of large and growing
data sets. Data sets obviously include the newly completed and available DNA
sequences for C. elegans (nematode), Drosophila (fruit fly), and (depending on one’s
definitions of “completed” and “available”) man. But other data sets include
gene expression data (recording the degree to which various genes are expressed
as protein in a tissue sample), bio-activity data on potential drug molecules,
x-ray crystallography and NMR data on protein structure, and many others.
Bioinformatics has been a particularly strong application area for ILP, dating
back to the start of Stephen Muggleton’s collaborations with Mike Sternberg
and Ross King [?]. Application areas include protein structure prediction
[?], mutagenicity prediction [?], and pharmacophore discovery [?] (discovery
of a 3D substructure responsible for drug activity that can be used to guide
the search for new drugs with similar activity). ILP is particularly well-suited
for bioinformatics tasks because of its abilities to take into account background
knowledge and structured data and to produce human-comprehensible results.
For example, the following is a potential pharmacophore for ACE inhibition (a
form of hypertension medication), where the spatial relationships are described
through pairwise distances.²

Molecule A is an ACE inhibitor if:
    molecule A contains a zinc binding site B, and
    molecule A contains a hydrogen acceptor C, and
    the distance between B and C is 7.9 +/- .75 Angstroms, and
    molecule A contains a hydrogen acceptor D, and
    the distance between B and D is 8.5 +/- .75 Angstroms, and
    the distance between C and D is 2.1 +/- .75 Angstroms, and
    molecule A contains a hydrogen acceptor E, and
    the distance between B and E is 4.9 +/- .75 Angstroms, and
    the distance between C and E is 3.1 +/- .75 Angstroms, and
    the distance between D and E is 3.8 +/- .75 Angstroms.

1 Not to put too fine a point on the matter, this paper contains unapologetic
proselytizing.

2 Hydrogen acceptors are atoms with a weak negative charge. Ordinarily, zinc-binding
would be irrelevant; it is relevant here because ACE is one of several proteins in
the body that typically contains an associated zinc ion. This is an automatically
generated translation of an ILP-generated clause.

Fig. 1. ACE inhibitor number 1 with highlighted 4-point pharmacophore.

Figures 1 and 2 show two different ACE inhibitors with the parts of the
pharmacophore highlighted and labeled.
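
To make the pairwise-distance reading of the pharmacophore concrete, the following is a minimal sketch of how such a clause could be checked against one conformation of a molecule. It assumes that candidate zinc binding sites and hydrogen acceptors are supplied as 3D coordinates in Angstroms; the function name, the data layout, and the coordinates in the usage note are illustrative assumptions and are not taken from any ILP system.

```python
from itertools import permutations
from math import dist

# Target pairwise distances (Angstroms) copied from the clause in the text.
# B is the zinc binding site; C, D, and E are hydrogen acceptors.
CONSTRAINTS = {
    ("B", "C"): 7.9, ("B", "D"): 8.5, ("C", "D"): 2.1,
    ("B", "E"): 4.9, ("C", "E"): 3.1, ("D", "E"): 3.8,
}
TOLERANCE = 0.75


def matches_pharmacophore(zinc_sites, acceptors):
    """Return a role assignment {role: point} meeting every constraint, else None.

    ASSUMPTION: each site is an (x, y, z) tuple for a single conformation.
    """
    for b in zinc_sites:
        # Try every ordered choice of three distinct acceptors for C, D, E.
        for c, d, e in permutations(acceptors, 3):
            roles = {"B": b, "C": c, "D": d, "E": e}
            if all(abs(dist(roles[p], roles[q]) - target) <= TOLERANCE
                   for (p, q), target in CONSTRAINTS.items()):
                return roles
    return None
```

For instance, with the invented coordinates `zinc_sites = [(0, 0, 0)]` and `acceptors = [(7.9, 0, 0), (8.2, 2.1, 0), (4.9, 0.4, 0.5)]`, every pairwise distance falls within the stated tolerance, so a role assignment is returned.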

A very different type of domain for machine learning is natural language pro- 
cessing (NLP). This domain also includes a wide variety of tasks such as part- 
of-speech tagging, grammar learning, information retrieval, and information ex- 
traction. Arguably, natural language translation (at least, very rough-cut trans- 
lation) is now a reality — witness for example the widespread use of Altavista’s 
Babelfish. Machine learning techniques are aiding in the construction of information
extraction engines that fill database entries from document abstracts (e.g.,
[?]) and from web pages (e.g., WhizBang! Labs, http://www.whizbanglabs.com).
NLP became a major application focus for ILP in particular with the ESPRIT
project ILP2. Indeed, as early as 1998 the majority of the application papers at
the ILP conference were on NLP tasks.

A third popular and challenging application area for machine learning is 
knowledge discovery from large databases with rich data formats, which might 
contain for example satellite images, audio recordings, movie files, etc. While 
Džeroski has shown how ILP applies very naturally to knowledge discovery from
ordinary relational databases [?], advances are needed to deal with multimedia
databases.

ILP has advantages over other machine learning techniques for all of the 
preceding application areas. Nevertheless, these and other potential applications 
also highlight the following shortcomings of present ILP technology. 
Fig. 2. ACE inhibitor number 2 with highlighted 4-point pharmacophore.

— Other techniques such as hidden Markov models, Bayes Nets and Dynamic 
Bayes Nets, and bigrams and trigrams can expressly represent the probabil- 
ities inherent in tasks such as part-of-speech tagging, alignment of proteins, 
robot maneuvering, etc. Few ILP systems are capable of representing or
processing probabilities.³

— ILP systems have higher time and space requirements than other machine 
learning systems, making it difficult to apply them to large data sets. Alter- 
native approaches such as stochastic search and parallel processing need to 
be explored. 

— ILP works well when data and background knowledge are cleanly expressible 
in first-order logic. But what can be done when databases contain images, 
audio, movies, etc.? ILP needs to learn lessons from constraint logic program- 
ming regarding the incorporation of special-purpose techniques for handling 
special data formats. 

— In scientific knowledge discovery, for example in the domain of bioinformat- 
ics, it would be beneficial if ILP systems could collaborate with scientists 
rather than merely running in batch mode. If ILP does not take this step, 
other forms of collaborative scientific assistants will be developed, supplant- 
ing ILP’s position within these domains. 

3 It should be noted that Stephen Muggleton and James Cussens have been pushing 
for more attention to probabilities in ILP. Stephen Muggleton initiated this direction 
with an invited talk at ILP’95 and James Cussens has a recently-awarded British 
EPSRC project along these lines. Nevertheless, little attention has been paid to this
shortcoming by other ILP researchers, myself included.
In light of application domains and the issues they raise, the remainder of
this paper discusses five directions for future research in ILP. Many of these
directions require fresh insights from other areas of computational logic. The
author’s hope is that this discussion will prompt researchers from other areas to
begin to explore ILP.⁴

3 Five Directions for ILP Research 

Undoubtedly there are more than five important directions for ILP research. But 
five directions stand out clearly at this point in time. They stand out not only 
in the application areas just mentioned, but also when examining current trends 
in AI research generally. These areas are 

— incorporating explicit probabilities into ILP 

— stochastic search 

— building special-purpose reasoners into ILP 

— enhancing human-computer interaction to make ILP systems true collabora- 
tors with human experts 

— parallel execution using commodity components 

Each of these research directions can contribute substantially to the future 
widespread success of ILP. And each of these directions could benefit greatly 
from the expertise of researchers from other areas of computational logic. This 
section discusses these five research directions in greater detail. 



3.1 Probabilistic Inference: ILP and Bayes Nets 

Bayesian Networks have largely supplanted traditional rule-based expert sys- 
tems. Why? Because in task after task we (AI practitioners) have realized that 
probabilities are central. For example, in medical diagnosis few universally true 
rules exist and few entirely accurate laboratory experiments are available. In- 
stead, probabilities are needed to model the task’s inherent uncertainty. Bayes 
Nets are designed specifically to model probability distributions and to reason
about these distributions accurately and (in some cases) efficiently. Consequently,
in many tasks including medical diagnosis [?], Bayes Nets have been
found to be superior to rule-based systems. Interestingly, inductive inference, or
machine learning, has turned out to be a very significant component of Bayes Net 
reasoning. Inductive inference from data is particularly important for developing 
or adjusting the conditional probability tables (CPTs) for various network nodes, 
but also is used in some cases even for developing or modifying the structure of 
the network itself. 

4 It is customary in technical papers for the author to refer to himself in the third 
person. But because the present paper is an invited paper expressing the author’s 
opinions, the remainder will be much less clumsy if the author dispenses with that 
practice, which I now will do. 
But not all is perfection and contentment in the world of Bayes Nets. A
Bayes Net is less expressive than first-order logic, on a par with propositional 
logic instead. Consequently, while a Bayes Net is a graphical representation, it 
cannot represent relational structures. The only relationships captured by the 
graphs are conditional dependencies among probabilities. This failure to capture 
other relational information is particularly troublesome when using the Bayes 
Net representation in learning. For a concrete illustration, consider the task of 
pharmacophore discovery. It would be desirable to learn probabilistic predic- 
tors, e.g., what is the probability that a given structural change to the molecule 
fluoxetine (Prozac) will yield an equally effective anti-depressant (specifically, 
serotonin reuptake inhibitor)? To build such a probabilistic predictor, we might 
choose to learn a Bayes Net from data on serotonin reuptake inhibitors. Unfor- 
tunately, while a Bayes Net can capture the probabilistic information, it cannot 
capture the structural properties of a molecule that are predictive of biological 
activity. 

The inability of Bayes Nets to capture relational structure is well known
and has led to attempts to extend the Bayes Net representation [?] and to
study inductive learning with such an extended representation. But the resulting
extended representations are complex and yet fall short of the expressivity
of first-order logic. An interesting alternative for ILP researchers to examine is
learning clauses with probabilities attached. It will be important in particular
to examine how such representations and learning algorithms compare with the
extended Bayes Net representations and learning algorithms. Several candidate
clausal representations have been proposed and include probabilistic logic
programs, stochastic logic programs, and probabilistic constraint logic programs;
Cussens provides a nice survey of these representations [?]. Study already has
begun into algorithms and applications for learning stochastic logic programs
[?], and this is an exciting area for further work. In addition, the first-order
representation closest to Bayes Nets is that of Ngo and Haddawy. The remainder
of this subsection points to approaches for, and potential benefits of, learning
these clauses in particular.

Clauses in the representation of Ngo and Haddawy may contain random 
variables as well as ordinary logical variables. A clause may contain at most one 
random variable in any one literal, and random variables may appear in body 
literals only if a random variable appears in the head. Finally, such a clause also 
has a Bayes Net fragment attached, which may be thought of as a constraint. This 
fragment has a very specific form. It is a directed graph of node depth two (edge 
depth one), with all the random variables from the clause body as parents of the 
random variable from the clause head.⁵ Figure 3 provides an example of such a
clause as might be learned in pharmacophore discovery (CPT not shown). This
clause enables us to specify, through a CPT, how the probability of a molecule 
being active depends on the particular values assigned to the distance variables 

5 This is not exactly the definition provided by Ngo and Haddawy, but it is an
equivalent one. Readers interested in deductive inference with this representation are
encouraged to see [?].
D1, D2, and D3. In general, the role of the added constraint in the form of a
Bayes net fragment is to define a conditional probability distribution over the 
random variable in the head, conditional on the values of the random variables 
in the body. When multiple such clauses are chained together during inference, 
a larger Bayes Net is formed that defines a joint probability distribution over 
the random variables. 



drug(Molecule, Activity_Level) :-
    contains_hydrophobe(Molecule, Hydrophobe),
    contains_basic_nitrogen(Molecule, Nitrogen),
    contains_hydrogen_acceptor(Molecule, Acceptor),
    distance(Molecule, Hydrophobe, Nitrogen, D1),
    distance(Molecule, Hydrophobe, Acceptor, D2),
    distance(Molecule, Nitrogen, Acceptor, D3).

Fig. 3. A clause with a Bayes Net fragment attached (CPT not included). The
random variables are Activity_Level, D1, D2, and D3. Rather than using a hard
range in which the values of D1, D2, and D3 must fall, as in the pharmacophore
described earlier, this new representation allows us to describe a probability
distribution over Activity_Level in terms of the values of D1, D2, and D3. For
example, we might assign higher probabilities to high Activity_Level as D1 gets
closer to 3 Angstroms from either above or below. The CPT itself might be a
linear regression model, i.e. a linear function of D1, D2, and D3 with some fixed
variance assumed, or it might be a discretized model, or other.




I conjecture that existing ILP algorithms can effectively learn clauses of this 
form with the following modification. For each clause constructed by the ILP 
algorithm, collect the positive examples covered by the clause. Each positive 
example provides a value for the random variable in the head of the clause, 
and because the example is covered, the example together with the background 
knowledge provides values for the random variables in the body. These values, 
over all the covered positive examples, can be used as the data for constructing 
the conditional probability table (CPT) that accompanies the attached Bayes 
Net fragment. When all the random variables are discrete, a simple, standard
method exists for constructing CPTs from such data and is described nicely in
[?]. If some or all of the random variables are continuous, then under certain
assumptions again simple, standard methods exist. For example, under one set
of assumptions linear regression can be used, and under another naive Bayes can
be used. In fact, the work by Srinivasan and Camacho [?] on predicting levels
of mutagenicity and the work by Craven and colleagues [?] on information
extraction can be seen as special cases of this proposed approach, employing
linear regression and naive Bayes, respectively.
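
For the discrete case, one common way to build a CPT from the covered positive examples is relative-frequency (maximum-likelihood) counting. The sketch below assumes that choice; the text does not say which standard method is intended. It also assumes each covered example has already been reduced to a pair of body random-variable values and a head value. The function name and the discretized values in the usage note are hypothetical.

```python
from collections import Counter, defaultdict


def estimate_cpt(rows):
    """Estimate P(head | parents) by relative frequency.

    rows: iterable of (parent_values, head_value) pairs, one per covered
          positive example (ASSUMPTION: all values are discrete and hashable).
    Returns {parent_values: {head_value: probability}}.
    """
    counts = defaultdict(Counter)
    for parents, head in rows:
        counts[tuple(parents)][head] += 1
    return {
        parents: {head: n / sum(c.values()) for head, n in c.items()}
        for parents, c in counts.items()
    }
```

With rows such as `(("near", "near", "far"), "high")`, where the three entries stand for invented discretizations of D1, D2, and D3, the result maps each observed combination of distance bins to a distribution over activity levels. Parent combinations that never occur among the covered examples receive no entry, so a practical system would need some smoothing or default.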

While the approach just outlined appears promising, of course it is not the 
only possible approach and may not turn out to be the best. More generally, 
ILP and Bayes Net learning are largely orthogonal. The former handles relational
domains well, while the latter handles probabilities well. And both Bayes
Nets and ILP have been applied successfully to a variety of tasks. Therefore, 
it is reasonable to hypothesize the existence and utility of a representation and 
learning algorithms that effectively capture the advantages of both Bayes net 
learning and ILP. The space of such representations and algorithms is large, so 
combining Bayes Net learning and ILP is an area of research that is not only 
promising but also wide open for further work. 

3.2 Stochastic Search 

Most ILP algorithms search a lattice of clauses ordered by subsumption. They 
seek a clause that maximizes some function of the size of the clause and coverage 
of the clause, i.e. the numbers of positive and negative examples entailed by the 
clause together with the background theory. Depending upon how they search 
this lattice, these ILP algorithms are classified as either bottom-up (based on 
least general generalization) or top-down (based on refinement). Algorithms are 
further classified by whether they perform a greedy search, beam search, admissi- 
ble search, etc. In almost all existing algorithms these searches are deterministic. 
But for other challenging logic/AI tasks outside ILP, stochastic searches have
consistently outperformed deterministic searches. This observation has been
repeated for a wide variety of tasks, beginning with the 1992 work of Kautz,
Selman, Levesque, Mitchell, and others on satisfiability using algorithms such
as GSAT and WSAT (WalkSAT) [?]. Consequently, a promising research
direction within ILP is the use of stochastic search rather than deterministic 
search to examine the lattice of clauses. A start has been made in stochastic 
search for ILP and this section describes that work. Nevertheless many issues 
remain unexamined, and I will mention some of the most important of these at 
the end of this section. 

ILP algorithms face not one but two difficult search problems. In addition to 
the search of the lattice of clauses, already described, simply testing the coverage 
of a clause involves repeated searches for proofs — “if I assume this clause is true, 
does a proof exist for that example?” The earliest work on stochastic search in 
ILP (to my knowledge) actually addressed this latter search problem. Sebag and
Rouveirol [?] employed stochastic matching, or theorem proving, and obtained
efficiency improvements over Progol in the prediction of mutagenicity, without
sacrificing predictive accuracy or comprehensibility. More recently, Botta,
Giordana, Saitta, and Sebag have pursued this approach further, continuing to show
the benefits of replacing deterministic matching with stochastic matching [?].

But at the center of ILP is the search of the clause lattice, and surprisingly 
until now the only stochastic search algorithms that have been tested have been 
genetic algorithms. Within ILP these have not yet been shown to significantly 
outperform deterministic search algorithms. I say it is surprising that only GAs 
have been attempted because for other logical tasks such as satisfiability and 
planning almost every other approach outperforms GAs, including simulated 
annealing, hill-climbing with random restarts and sideways moves (e.g. GSAT), 
and directed random walks (e.g. WSAT) [?]. Therefore, a natural direction for
ILP research is to use these alternative forms of stochastic search to examine 
the lattice of clauses. The remainder of this section discusses some of the issues 
involved in this research direction, based on my initial foray in this direction with 
Ashwin Srinivasan that includes testing variants of GSAT and WSAT tailored 
to ILP. 

The GSAT algorithm was designed for testing the satisfiability of Boolean 
CNF formulas. GSAT randomly draws a truth assignment over the n proposi- 
tional variables in the formula and then repeatedly modifies the current assign- 
ment by flipping a variable. At each step all possible flips are tested, and the flip 
that yields the largest number of satisfied clauses is selected. It may be the case 
that every possible flip yields a score no better (in fact, possibly even worse) 
than the present assignment. In such a case a flip is still chosen and is called 
a “sideways move” (or “downward move” if strictly worse). Such moves turn 
out to be quite important in GSAT’s performance. If GSAT finds an assignment 
that satisfies the CNF formula, it halts and returns the satisfying assignment. 
Otherwise, it continues to flip variables until it reaches some pre-set maximum 
number of flips. It then repeats the process by drawing a new random truth as- 
signment. The overall process is repeated until a satisfying assignment is found 
or a pre-set maximum number of iterations is reached. 
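
The procedure just described can be sketched as follows. This is a minimal illustration rather than a reference implementation: the clause encoding (lists of signed integers), the parameter names, and the uniform random tie-breaking among equally good flips are assumptions of the sketch.

```python
import random


def count_satisfied(clauses, assignment):
    """Number of clauses containing at least one true literal.

    A clause is a list of nonzero ints: +v is variable v, -v is its negation.
    """
    return sum(
        any((lit > 0) == assignment[abs(lit)] for lit in clause)
        for clause in clauses
    )


def gsat(clauses, n_vars, max_flips=50, max_tries=10, rng=None):
    """Return a satisfying assignment {var: bool}, or None if none is found."""
    rng = rng if rng is not None else random.Random()
    variables = range(1, n_vars + 1)
    for _ in range(max_tries):
        # Draw a fresh random truth assignment for this try.
        assignment = {v: rng.random() < 0.5 for v in variables}
        for _ in range(max_flips):
            if count_satisfied(clauses, assignment) == len(clauses):
                return assignment
            # Score every possible single-variable flip.
            scores = {}
            for v in variables:
                assignment[v] = not assignment[v]
                scores[v] = count_satisfied(clauses, assignment)
                assignment[v] = not assignment[v]
            # Take the best flip even when it is sideways or downward.
            best = max(scores.values())
            v = rng.choice([u for u, s in scores.items() if s == best])
            assignment[v] = not assignment[v]
        if count_satisfied(clauses, assignment) == len(clauses):
            return assignment
    return None
```

Calling `gsat([[1, 2], [-1, 3], [-2, -3], [1, -3]], 3)` returns one of the satisfying assignments of that small formula, while an unsatisfiable input such as `[[1], [-1]]` exhausts its tries and yields `None`.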

Our ILP variant of this algorithm draws a random clause rather than a 
random truth assignment. Flips involve adding or deleting literals in this clause. 
Applying the GSAT methodology to ILP in this manner raises several important 
points. First, in GSAT scoring a given truth assignment is very fast. In contrast, 
scoring a clause can be much more time consuming because it involves repeated 
theorem proving. Therefore, it might be beneficial to combine the “ILP-GSAT” 
algorithm with the type of stochastic theorem proving mentioned above. Second, 
the number of literals that can be built from a language often is infinite, so 
we cannot test all possible additions of a literal. Our approach has been to 
base any given iteration of the algorithm on a “bottom clause” built from a 
“seed example,” based on the manner in which the ILP system PROGOL [?]
constrains its search space. But there might be other alternatives for constraining 
the set of possible literals to be added at any step. Or it might be preferable to 
consider changing literals rather than only adding or deleting them. Hence there 
are many alternative GSAT-like algorithms that might be built and tested. 
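
A toy version of such a GSAT-like clause search is sketched below under strong simplifying assumptions, and it should not be read as the algorithm actually tested. The clause is a subset of the literals of a given bottom clause, a flip adds or deletes one such literal, and coverage is approximated by set inclusion over examples described as sets of literals (a propositional stand-in for the repeated theorem proving a real ILP system needs). The score, positives covered minus negatives covered, and all names are hypothetical.

```python
import random


def covers(clause, example):
    # ASSUMPTION: set inclusion stands in for proving the example from the
    # clause plus background knowledge.
    return clause <= example


def score(clause, positives, negatives):
    """Positives covered minus negatives covered (an illustrative score)."""
    return (sum(covers(clause, e) for e in positives)
            - sum(covers(clause, e) for e in negatives))


def ilp_gsat(bottom, positives, negatives, max_flips=20, max_tries=5, rng=None):
    """Return (best_clause, best_score) found by GSAT-style flips."""
    rng = rng if rng is not None else random.Random()
    best = frozenset()
    best_score = score(best, positives, negatives)
    for _ in range(max_tries):
        # Draw a random clause: a random subset of the bottom clause's literals.
        clause = frozenset(lit for lit in bottom if rng.random() < 0.5)
        for _ in range(max_flips + 1):
            current = score(clause, positives, negatives)
            if current > best_score:
                best, best_score = clause, current
            if best_score == len(positives):  # all positives, no negatives
                return best, best_score
            # A flip adds the literal if absent and deletes it if present.
            neighbours = [clause ^ {lit} for lit in bottom]
            scores = [score(n, positives, negatives) for n in neighbours]
            top = max(scores)
            clause = rng.choice(
                [n for n, s in zip(neighbours, scores) if s == top])
    return best, best_score
```

Restricting flips to the literals of the bottom clause is what keeps the neighbourhood finite; swapping in a different `covers` function, for example one based on stochastic matching, would not change the search loop.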

Based on our construction of GSAT-like ILP algorithms, one can imagine 
analogous WSAT-like and simulated annealing ILP algorithms. Consider WSAT 
in particular. On every flip, with probability p (user-specified) WSAT makes a
randomly selected efficacious flip instead of a GSAT flip. An efficacious flip is a
flip that satisfies some previously-unsatisfied clause in the CNF formula, even if 
the flip is not the highest-scoring flip as required by GSAT. WSAT outperforms 
GSAT for many satisfiability tasks because the random flips make it less likely to 
get trapped in local optima. It will be interesting to see if the benefit of WSAT 
over GSAT for satisfiability carries over to ILP. The same issues mentioned above 
for ILP-GSAT also apply to ILP-WSAT.
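
For comparison with GSAT, the WSAT step described above can be sketched for ordinary satisfiability as follows. The sketch follows the description in the text: with probability p it flips a randomly chosen variable from a randomly chosen unsatisfied clause, which necessarily satisfies that clause, and otherwise it makes a GSAT flip. The encoding, parameter names, and default values are assumptions of the sketch.

```python
import random


def satisfied(clause, assignment):
    # A clause is a non-empty list of nonzero ints: +v is variable v,
    # -v is its negation.
    return any((lit > 0) == assignment[abs(lit)] for lit in clause)


def wsat(clauses, n_vars, p=0.5, max_flips=1000, max_tries=10, rng=None):
    """Return a satisfying assignment {var: bool}, or None if none is found."""
    rng = rng if rng is not None else random.Random()
    variables = range(1, n_vars + 1)

    def score_after_flip(assignment, v):
        assignment[v] = not assignment[v]
        n = sum(satisfied(c, assignment) for c in clauses)
        assignment[v] = not assignment[v]
        return n

    for _ in range(max_tries):
        assignment = {v: rng.random() < 0.5 for v in variables}
        for _ in range(max_flips):
            unsat = [c for c in clauses if not satisfied(c, assignment)]
            if not unsat:
                return assignment
            if rng.random() < p:
                # Efficacious flip: any variable of an unsatisfied clause.
                v = abs(rng.choice(rng.choice(unsat)))
            else:
                # GSAT flip: best-scoring variable, ties broken at random.
                scores = {u: score_after_flip(assignment, u) for u in variables}
                best = max(scores.values())
                v = rng.choice([u for u, s in scores.items() if s == best])
            assignment[v] = not assignment[v]
        if all(satisfied(c, assignment) for c in clauses):
            return assignment
    return None
```

An ILP-WSAT variant would need an analogue of the unsatisfied clause, for instance an uncovered positive example or a covered negative example; which analogue works best is one of the open design choices.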
It is too early in the work to present concrete conclusions regarding stochastic
ILP. Rather the goal of this section has been to point to a promising direction
and discuss the space of design alternatives to be explored. Researchers with
experience in stochastic search for constraint satisfaction and other logic/AI
search tasks will almost certainly have additional insights that will be vital to
the exploration of stochastic search for ILP.

3.3 Special-Purpose Reasoning Mechanisms 

One of the well-known success stories of computational logic is constraint logic 
programming. And one of the reasons for this success is the ability to integrate 
logic and special purpose reasoners or constraint solvers. Many ILP applications 
could benefit from the incorporation of special-purpose reasoning mechanisms. 
Indeed, the approach advocated in Section 3.1 to incorporating probabilities 
in ILP can be thought of as invoking special purpose reasoners to construct 
constraints in the form of Bayes Net fragments. The work by Srinivasan and 
Camacho mentioned there uses linear regression to construct a constraint, while 
the work by Craven and Slattery uses naive Bayes techniques to construct a 
constraint. The point that is crucial to notice is that ILP requires a “constraint 
constructor,” such as linear regression, in addition to the constraint solver re- 
quired during deduction. Let’s now turn to consideration of tasks where other 
types of constraint generators might be useful. 

Consider the general area of knowledge discovery from databases. Suppose 
we take the standard logical interpretation of a database, where each relation is 
a predicate, and each tuple in the relation is a ground atomic formula built from 
that predicate. Džeroski and Lavrač show how ordinary ILP techniques are very
naturally suited to this task, if we have an “ordinary” relational database. But
now suppose the database contains some form of complex objects, such as images.
Simple logical similarities may not capture the important common features
across a set of images. Instead, special-purpose image processing techniques may
be required, such as those described by Leung and colleagues [?]. In addition
to simple images, special-purpose constraint constructors might be required
when applying ILP to movie (e.g. MPEG) or audio (e.g. MIDI) data, or other
data forms that are becoming ever more commonplace with the growth of
multimedia. For example, a fan of Bach, Mozart, Brian Wilson, and Elton John
would love to be able to enter her/his favorite pieces, have ILP with a constraint
generator build rules to describe these favorites, and have the rules suggest other
pieces or composers s/he should access. As multimedia data becomes more
commonplace, ILP can remain applicable only if it is able to incorporate
special-purpose constraint generators.
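
The standard logical interpretation of an ordinary relational database mentioned at the start of this discussion can be illustrated with a few lines of code: each relation name becomes a predicate, and each tuple becomes a ground atom. The relation `bond` and its rows are invented purely for illustration, and the function name is hypothetical.

```python
def relation_to_atoms(predicate, tuples):
    """Render each tuple of a relation as a ground atomic formula (a string)."""
    return [f"{predicate}({', '.join(str(v) for v in row)})" for row in tuples]


# Hypothetical relation bond(Molecule, Atom1, Atom2, BondType).
bond_rows = [("m1", "a1", "a2", 1), ("m1", "a2", "a3", 2)]
atoms = relation_to_atoms("bond", bond_rows)
```

Here `atoms` holds `bond(m1, a1, a2, 1)` and `bond(m1, a2, a3, 2)`. A column holding an image or an audio clip has no comparably useful rendering as a constant, which is where a special-purpose constraint constructor would have to take over.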

Alan Frisch and I have shown that the ordinary subsumption ordering over 
formulas scales up quite naturally to incorporate constraints [?]. Nevertheless,
that work does not address some of the hardest issues, such as how to ensure the 
efficiency of inductive learning systems based on this ordering and how to design 
the right types of constraint generators. These questions require much further 
research involving real-world applications such as multimedia databases. 
One final point about special purpose reasoners in ILP is worth making.
Constructing a constraint may be thought of as inventing a predicate. Predicate
invention within ILP has a long history [?]. General techniques
for predicate invention encounter the problem that the space of “inventable”
predicates is unconstrained, and hence allowing predicate invention is roughly
equivalent to removing all bias from inductive learning. While removing bias
may sound at first to be a good idea, inductive learning in fact requires bias
[?]. Special purpose techniques for constraint construction appear to make
it possible to perform predicate invention in a way that is limited enough to be
effective [?].

3.4 Interaction with Human Experts 

To discover new knowledge from data in fields such as telecommunications, 
molecular biology, or pharmaceuticals, it would be beneficial if a machine learn- 
ing system and a human expert could act as a team, taking advantage of the 
computer’s speed and the expert’s knowledge and skills. ILP systems have three 
properties that make them natural candidates for collaborators with humans in 
knowledge discovery: 

Declarative Background Knowledge ILP systems can make use of declara- 
tive background knowledge about a domain in order to construct hypotheses. 
Thus a collaboration can begin with a domain expert providing the learning 
system with general knowledge that might be useful in the construction of 
hypotheses. Most ILP systems also permit the expert to define the hypothe- 
sis space using additional background knowledge, in the form of a declarative 
bias. 

Natural Descriptions of Structured Examples. Feature-based learning systems
require the user to begin by creating features to describe the examples.
Because many knowledge discovery tasks involve complex structured exam- 
ples, such as molecules, users are forced to choose only composite features 
such as molecular weight — thereby losing information — or to invest substan- 
tial effort in building features that can capture structure (see |ES| for a 
discussion in the context of molecules). ILP systems allow a structured ex- 
ample to be described naturally in terms of the objects that compose it, 
together with relations between those objects. The 2-dimensional structure 
of a molecule can be represented directly using its atoms as the objects and 
bonds as the relations; 3-dimensional structure can be captured by adding 
distance relations. 

Human-Comprehensible Output. ILP systems share with propositional-logic
learners the ability to present a user with declarative, comprehensible rules 
as output. Some ILP systems can return rules in English along with visual 
aids. For example, the pharmacophore description and corresponding figures 
in Section 2 were generated automatically by PROGOL. 

Despite the useful properties just outlined, ILP systems, like other machine
learning systems, have a number of shortcomings as collaborators with humans
in knowledge discovery. First, most ILP systems return a single theory
based on heuristics, thus casting away many clauses that might be
interesting to a domain expert. But the only currently existing alternative is 
the version space approach, which has unpalatable properties that include in- 
efficiency, poor noise tolerance, and a propensity to overwhelm users with too 
large a space of possible hypotheses. Second, ILP systems cannot respond to a 
human expert’s questions in the way a human collaborator would. They operate 
in simple batch mode, taking a data set as input, and returning a hypothesis 
on a take-it-or-leave-it basis. Third, ILP systems do not question the input data 
in the way a human collaborator would, spotting surprising (and hence possibly 
erroneous) data points and raising questions about them. Some ILP systems will 
flag mutually inconsistent data points but to my knowledge none goes beyond 
this. Fourth, while a human expert can provide knowledge-rich forms of hypoth- 
esis justification, for example relating a new hypothesis to existing beliefs, ILP 
systems merely provide accuracy estimates as the sole justification. 

To build upon ILP’s strengths as a technology for human-computer collabo- 
ration in knowledge discovery, the above shortcomings should be addressed. ILP 
systems should be extended to display the following capabilities. 

1. maintain and summarize alternative hypotheses that explain or describe 
the data, rather than providing a single answer based on a general-purpose 
heuristic; 

2. propose to human experts practical sequences of experiments to refine or 
distinguish between competing hypotheses; 

3. provide non-numerical justification for hypotheses, such as relating them
to prior beliefs or illustrative examples (in addition to providing numerical 
accuracy estimates); 

4. answer an expert’s questions regarding hypotheses; 

5. consult the expert regarding anomalies or surprises in the data. 

Addressing such human-computer interface issues obviously requires a variety of 
logical and AI expertise. Thus contributions from other areas of computational 
logic, such as the study of logical agents, will be vital. While several projects 
have recently begun that investigate some of these issues (see footnote 6),
developing collaborative systems is an ambitious goal with more than enough room for many
more researchers. And undoubtedly other issues not mentioned here will become 
apparent as this work progresses. 

3.5 Parallel Execution 

While ILP has numerous advantages over other types of machine learning,
including advantages mentioned at the start of the previous section, it has two
particularly notable disadvantages: run time and space requirements.
Fortunately for ILP, at the same time that larger applications are highlighting these
disadvantages, parallel processing “on the cheap” is becoming widespread. Most
notable is the widespread use of “Beowulf clusters” and of “Condor pools”,
arrangements that connect tens, hundreds, or even thousands of personal
computers or workstations to permit parallel processing. Admittedly, parallel
processing cannot change the order of the time or space complexity of an
algorithm. But most ILP systems already use broad constraints, such as maximum
clause size, to hold down exponential terms. Rather, the need is to beat back
the large constants brought in by large real-world applications.

6 Stephen Muggleton has a British EPSRC project on closed-loop learning, in which
the human is omitted entirely. While this seems the reverse of a collaborative system,
it raises similar issues, such as maintaining competing hypotheses and automatically
proposing experiments. I am beginning a U.S. National Science Foundation project
on collaborative systems with (not surprisingly) exactly the goals above.

Yu Wang and David Skillicorn recently developed a parallel implementation
of PROGOL under the Bulk Synchronous Parallel (BSP) model and claim
superlinear speedup from this implementation. Alan Wild worked with me at
the University of Louisville to re-implement on a Beowulf cluster a top-down
ILP search for pharmacophore discovery, and the result was a linear speedup.
The remainder of this section describes how large-scale parallelism can be
achieved very simply in a top-down complete search ILP algorithm. This was the
approach taken in the implementation cited above. From this discussion, one can
imagine more interesting approaches for other types of top-down searches such
as greedy search.

The ideal in parallel processing is a decrease in processing time that is a lin- 
ear function, with a slope near 1, of the number of processors used. (In some rare 
cases it is possible to achieve superlinear speed-up.) The barriers to achieving 
the ideal are (1) overhead in communication between processes and (2) compe- 
tition for resources between processes. Therefore, a good parallel scheme is one 
where the processes are relatively independent of one another and hence require 
little communication or resource sharing. The key observation in the design of 
the parallel ILP scheme is that two competing hypotheses can be tested against 
the data completely independently of one another. Therefore the approach ad- 
vocated here is to distribute the hypothesis space among different processors 
for testing against the data. These processors need not communicate with one 
another during testing, and they need not write to a shared memory space. 

In more detail, for complete search a parallel ILP scheme can employ a 
master-worker design, where the master assigns different segments of the hy- 
pothesis space to workers that then test hypotheses against the data. Workers 
communicate back to the master all hypotheses achieving a pre-selected minimum
valuation score (e.g. 95% accuracy) on the data. As workers become free,
the master continues to assign new segments of the space until the entire space 
has been explored. The only architectural requirements for this approach are (1) 
a mechanism for communication between the master and each worker and (2) 
read access for each worker to the data. Because data do not change during a 
run, this scheme can easily operate under either a shared memory or message 
passing architecture; in the latter, we incur a one-time overhead cost of initially 
communicating the data to each worker. The only remaining overhead, on either 
architecture, consists of the time spent by the master and time for master-worker 
communication. In “needle in a haystack” domains, which are the motivation
for complete search, one expects very few hypotheses to be communicated from
workers to the master, so overhead for the communication of results will be low. 
If it also is possible for the master to rapidly segment the hypothesis space in 
such a way that the segments can be communicated to the workers succinctly, 
then overall overhead will be low and the ideal of linear speed-up can be realized. 
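
The master-worker scheme just described can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the implementation referred to in the text: the function names, the toy hypothesis space (integer cut-offs) and the accuracy score are all invented for the example, and a thread pool stands in for the cluster. On a real Beowulf cluster or Condor pool the workers would be separate processes or machines.

```python
from concurrent.futures import ThreadPoolExecutor


def evaluate_segment(segment, data, score, threshold):
    """Worker: score every hypothesis in one segment, keep only the good ones."""
    return [h for h in segment if score(h, data) >= threshold]


def complete_search(segments, data, score, threshold, workers=4):
    """Master: queue all segments; each free worker picks up the next one."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [pool.submit(evaluate_segment, seg, data, score, threshold)
                   for seg in segments]
        # Only hypotheses reaching the threshold travel back to the master.
        return [h for fut in futures for h in fut.result()]


# Toy stand-ins (assumptions): a hypothesis is an integer cut-off h that
# labels x positive when x >= h; the score is plain accuracy on the data.
def accuracy(h, data):
    return sum((x >= h) == label for x, label in data) / len(data)


data = [(x, x >= 7) for x in range(20)]
hypotheses = list(range(20))
segments = [hypotheses[i:i + 5] for i in range(0, 20, 5)]
good = complete_search(segments, data, accuracy, threshold=0.94)
```

Workers never talk to one another and only read the data, which is the independence property the scheme relies on; the only traffic is one segment out and a short list of results back.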

4 Conclusions 

ILP has attracted great interest within the machine learning and AI communi- 
ties at large because of its logical foundations, its ability to utilize background 
knowledge and structured data representations, and its comprehensible results. 
But most of all, the interest has come from ILP’s application successes. Nev- 
ertheless, ILP needs further advances to maintain this record of success, and 
these advances require further contributions from other areas of computational 
logic. System builders and parallel implementation experts are needed if the ILP 
systems of the next decade are to scale up to the next generation of data sets, 
such as those being produced by Affymetrix’s (TM) gene expression microar- 
rays and Celera’s (TM) shotgun approach to DNA sequencing. Researchers on 
probability and logic are required if ILP is to avoid being supplanted by the 
next generation of extended Bayes Net learning systems. Experts on constraint 
satisfaction and constraint logic programming have the skills necessary to bring 
successful stochastic search techniques to ILP and to allow ILP techniques to 
extend to multimedia databases. The success of ILP in the next decade (notice I 
avoided the strong temptation to say “next millennium” ) depends on the kinds 
of interactions being fostered at Computational Logic 2000.

Abstract. A learning algorithm for the class of range restricted Horn
expressions is presented and proved correct. The algorithm works within
the framework of learning from entailment, where the goal is to exactly
identify some pre-fixed and unknown expression by posing queries to
membership and equivalence oracles. This class has been shown to be 
learnable in previous work. The main contribution of this paper is in 
presenting a more direct algorithm for the problem which yields an im- 
provement in terms of the number of queries made to the oracles. The 
algorithm is also adapted to the class of Horn expressions with inequal- 
ities on all syntactically distinct terms where a significant improvement 
in the number of queries is obtained. 

1 Introduction 

This paper considers the problem of learning an unknown first order expression
T (see footnote 1) from examples of clauses that T entails or does not entail. This
type of learning framework is known as learning from entailment. [FP93] formalised
learning from entailment using equivalence queries and membership queries and showed the
learnability of propositional Horn expressions. Generalising this result to the first
order setting is of clear interest. Learning first order Horn expressions has become
a fundamental problem in Inductive Logic Programming. Theoretical
results have shown that learning from examples only is feasible for very restricted
classes [Coh95a] and that, in fact, learnability becomes intractable when slightly
more general classes are considered. To tackle this problem, learners
have been equipped with the ability to ask questions. It is the case that with 
this ability larger classes can be learned. In this paper, the questions that the 
learner is allowed to ask are membership and equivalence queries. While our work 
is purely theoretical, there are systems that are able to learn using equivalence 
and membership queries (MIS [Sha83], CLINT [RB92], for example). These ideas
have also been used in systems that learn from examples only [Kha00].

* This work was partly supported by EPSRC Grant GR/M21409. 

1 The unknown expression to be identified is commonly referred to as target expression. 



J. Cussens and A. Frisch (Eds.): ILP2000, LNAI 1866, pp. 21 I.TGI 2000. 
(c) Springer-Verlag Berlin Heidelberg 2000






Marta Arias and Roni Khardon 



A learning algorithm for the class of range restricted Horn expressions is
presented. The main property of this class is that all the terms in the conclusion
of a clause appear in the antecedent of the clause, possibly as subterms of more
complex terms. This work is based on previous results on learnability of function
free Horn expressions and range restricted Horn expressions. The problem of
learning range restricted Horn expressions was solved in [Kha99b] by reducing
it to the problem of learning function free Horn expressions, solved in [Kha99a].
The algorithm presented here has been obtained by retracing this reduction
and using the resulting algorithm as a starting point. However, it has been
significantly modified and improved. The algorithm in [Kha99a, Kha99b] uses
two main procedures. The first, given a counterexample clause, minimises the
clause while maintaining it as a counterexample. The minimisation procedure
used here is stronger, resulting in a clause which includes a syntactic variant
of a target clause as a subset. The second procedure combines two examples
producing a new clause that may be a better approximation for the target.
While the algorithm in [Kha99a, Kha99b] uses direct products of models, we use
an operation based on the lgg (least general generalisation [Plo70]). The use
of lgg seems a more natural and intuitive technique to use for learning from
entailment, and it has been used before, both in theoretical and applied work.

We extend our results to the class of fully inequated range restricted Horn 
expressions. The main property of this class is that it does not allow unification of 
its terms. To avoid unification, every clause in this class includes in its antecedent 
a series of inequalities between all its terms. With a minor modification to the 
learning algorithm, we are able to show learnability of the class of fully inequated 
range restricted Horn expressions. The more restricted nature of this class allows 
for better bounds to be derived. 

The rest of the paper is organised as follows. Section 2 gives some preliminary
definitions. The learning algorithm is presented in Section 3 and proved correct in
Section 4. The results are extended to the class of fully inequated range restricted
Horn expressions in Section 5. Finally, Section 6 compares the results obtained
in this paper with previous results and includes some concluding remarks.



2 Preliminaries 

We consider a subset of the class of universally quantified expressions in first
order logic. Definitions of first order languages can be found in standard texts,
e.g. [Llo87]. We assume familiarity with notions such as term, atom, literal and
Horn clause on the syntactic side, and interpretation, truth value, satisfiability
and logical implication on the semantic side.

A Range Restricted Horn clause is a definite Horn clause in which every term 
appearing in its consequent also appears in its antecedent, possibly as a subterm 
of another term. A Range Restricted Horn Expression is a conjunction of Range 
Restricted Horn clauses. 



A New Algorithm for Learning Range Restricted Horn Expressions 






A multi-clause is a pair of the form $[s, c]$, where both $s$ and $c$ are sets of literals
such that $s \cap c = \emptyset$; $s$ is the antecedent of the multi-clause and $c$ is the consequent.
Both are interpreted as the conjunction of the literals they contain. Therefore,
the multi-clause $[s, c]$ is interpreted as the logical expression $\bigwedge_{b \in c} (s \to b)$. An
ordinary clause $C = s_c \to b_c$ corresponds to the multi-clause $[s_c, \{b_c\}]$.

We say that a logical expression $T$ implies a multi-clause $[s, c]$ if it implies
all of its single clause components. That is, $T \models [s, c]$ iff $T \models \bigwedge_{b \in c} (s \to b)$.

A multi-clause $[s, c]$ is correct w.r.t. an expression $T$ iff $T \models [s, c]$. A
multi-clause $[s, c]$ is exhaustive w.r.t. $T$ if every literal $b \notin s$ such that $T \models s \to b$ is
included in $c$. A multi-clause is full w.r.t. $T$ if it is correct and exhaustive w.r.t.
$T$.

The size of a term is the number of occurrences of variables plus twice the 
number of occurrences of function symbols (including constants). The size of an
atom is the sum of the sizes of the (top-level) terms it contains plus 1. Finally, 
the size of a multi-clause [s, c] is the sum of sizes of atoms in s. 

Let $s_1, s_2$ be any two sets of literals. We say that $s_1$ subsumes $s_2$ (denoted
$s_1 \preceq s_2$) if and only if there exists a substitution $\theta$ such that $s_1 \cdot \theta \subseteq s_2$. We also
say that $s_1$ is a generalisation of $s_2$.

Let $s$ be any set of literals. Then $ineq(s)$ is the set of all inequalities between
terms appearing in $s$. As an example, let $s$ be the set $\{p(x, y), q(f(y))\}$ with
terms $\{x, y, f(y)\}$. Then $ineq(s) = \{x \neq y, x \neq f(y), y \neq f(y)\}$, also written as
$(x \neq y \neq f(y))$ for short.
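
The size measure and the ineq operation are easy to state in code. The following is a minimal sketch under an assumed encoding that is not part of the paper: a variable is a Python string, a function term or constant is a tuple headed by its symbol, and an atom is a tuple headed by its predicate. All function names are illustrative.

```python
from itertools import combinations

# Assumed encoding: a variable is a str; a function term (or constant) is a
# tuple (symbol, arg1, ..., argn); an atom is a tuple (predicate, arg1, ...).


def subterms(t):
    """Yield t and every term nested inside it."""
    yield t
    if isinstance(t, tuple):
        for arg in t[1:]:
            yield from subterms(arg)


def terms_of(literals):
    """All terms (including subterms) appearing in a set of atoms."""
    return {u for atom in literals for arg in atom[1:] for u in subterms(arg)}


def term_size(t):
    """Variables count 1; every function symbol or constant counts 2."""
    if isinstance(t, str):
        return 1
    return 2 + sum(term_size(arg) for arg in t[1:])


def atom_size(atom):
    """Sum of the sizes of the top-level terms, plus 1."""
    return 1 + sum(term_size(arg) for arg in atom[1:])


def ineq(literals):
    """One unordered pair {t, t'} for every inequality between distinct terms."""
    return {frozenset(pair) for pair in combinations(terms_of(literals), 2)}


s = {("p", "x", "y"), ("q", ("f", "y"))}   # the set {p(x, y), q(f(y))}
```

For the example set above, `terms_of(s)` contains the three terms x, y and f(y), and `ineq(s)` contains the three pairs listed in the text.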



Least General Generalisation. The algorithm proposed uses the least general
generalisation or lgg operation [Plo70]. This operation computes a generalisation
of two sets of literals. It works as follows.

The lgg of two terms $f(s_1, ..., s_n)$ and $g(t_1, ..., t_m)$ is defined as the term
$f(lgg(s_1, t_1), ..., lgg(s_n, t_n))$ if $f = g$ and $n = m$. Otherwise, it is a new variable
$x$, where $x$ stands for the lgg of that pair of terms throughout the computation
of the lgg of the set of literals. This information is kept in what we call
the lgg table. The lgg of two compatible atoms $p(s_1, ..., s_n)$ and $p(t_1, ..., t_n)$ is
$p(lgg(s_1, t_1), ..., lgg(s_n, t_n))$. The lgg is only defined for compatible atoms, that
is, atoms with the same predicate symbol and arity. The lgg of two compatible
positive literals $l_1$ and $l_2$ is the lgg of the underlying atoms. The lgg of two
compatible negative literals $l_1$ and $l_2$ is the negation of the lgg of the underlying
atoms. Two literals are compatible if they share predicate symbol, arity and
sign. The lgg of two sets of literals $s_1$ and $s_2$ is the set $\{lgg(l_1, l_2) \mid (l_1, l_2)$ are
two compatible literals of $s_1$ and $s_2\}$.

Example 1. Let $s_1 = \{p(a, f(b)), p(g(a, x), c), q(a)\}$ and $s_2 = \{p(z, f(2)), q(z)\}$
with $lgg(s_1, s_2) = \{p(X, f(Y)), p(Z, V), q(X)\}$. The lgg table produced during
the computation of $lgg(s_1, s_2)$ is

[a - z => X]            (from p(a, f(b)) with p(z, f(2)))
[b - 2 => Y]            (from p(a, f(b)) with p(z, f(2)))
[f(b) - f(2) => f(Y)]   (from p(a, f(b)) with p(z, f(2)))
[g(a,x) - z => Z]       (from p(g(a, x), c) with p(z, f(2)))
[c - f(2) => V]         (from p(g(a, x), c) with p(z, f(2)))
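
The lgg of two sets of literals, together with its shared table, can be sketched as follows. The encoding is an assumption made for this illustration (variables as strings, function terms and constants as tuples headed by their symbol, literals as triples of sign, predicate and argument tuple), and the generated variable names V0, V1, ... are assumed not to clash with names already in use. Unlike the table printed above, this sketch records only the pairs that introduce a variable; compound entries such as f(b) - f(2) are recomputed rather than stored.

```python
def lgg_term(s, t, table):
    """lgg of two terms; `table` is the lgg table shared by the whole computation."""
    if (isinstance(s, tuple) and isinstance(t, tuple)
            and s[0] == t[0] and len(s) == len(t)):
        return (s[0],) + tuple(lgg_term(a, b, table)
                               for a, b in zip(s[1:], t[1:]))
    # Any other pair gets one variable, reused whenever the same pair recurs.
    if (s, t) not in table:
        table[(s, t)] = f"V{len(table)}"
    return table[(s, t)]


def lgg_sets(s1, s2):
    """Literals are (sign, predicate, args); returns (lgg set, lgg table)."""
    table = {}
    result = set()
    for sign1, pred1, args1 in s1:
        for sign2, pred2, args2 in s2:
            # Compatible literals share sign, predicate symbol and arity.
            if (sign1, pred1, len(args1)) == (sign2, pred2, len(args2)):
                args = tuple(lgg_term(a, b, table)
                             for a, b in zip(args1, args2))
                result.add((sign1, pred1, args))
    return result, table


# Example 1, with a, b, c and 2 encoded as constants and x, z as variables.
a, b, c, two = ("a",), ("b",), ("c",), ("2",)
s1 = {(True, "p", (a, ("f", b))), (True, "p", (("g", a, "x"), c)), (True, "q", (a,))}
s2 = {(True, "p", ("z", ("f", two))), (True, "q", ("z",))}
result, table = lgg_sets(s1, s2)
```

Because the pair a - z is looked up in the same table for both the p literal and the q literal, the two occurrences receive the same variable, which is what makes q(X) share its argument with p(X, f(Y)).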



2.1 The Learning Model 

We consider the model of exact learning from entailment. In this model
examples are clauses. Let $T$ be the target expression, $H$ any hypothesis presented
by the learner and $C$ any clause. An example $C$ is positive for a target theory $T$
if $T \models C$, otherwise it is negative. The learning algorithm can make two types
of queries. An Entailment Equivalence Query (EntEQ) returns “Yes” if $H = T$
and otherwise it returns a clause $C$ that is a counterexample, i.e., $T \models C$ and
$H \not\models C$ or vice versa. For an Entailment Membership Query (EntMQ), the
learner presents a clause $C$ and the oracle returns “Yes” if $T \models C$, and “No”
otherwise. The aim of the learning algorithm is to exactly identify the target 
expression T by making queries to the equivalence and membership oracles. 

2.2 Transforming the Target Expression 

In this section we describe the transformation $U(T)$ performed on any target
expression $T$. This transformation is never computed by the learning algorithm; it
is only used in the analysis of the proof of correctness. Related work
also uses inequalities in clauses, although the learning algorithm and approach
are completely different.

The idea is to create from every clause $C$ in $T$ the set of clauses $U(C)$. Every
clause in $U(C)$ corresponds to the original clause $C$ with its terms unified in a
unique way, different from every other clause in $U(C)$. All possible unifications
of terms of $C$ are covered by one of the clauses in $U(C)$. The clauses in $U(C)$
will only be satisfied if the terms are unified in exactly that way. To achieve this,
a series of appropriate inequalities are prepended to every transformed clause’s
antecedent. This process is described in Figure 1. It uses the most general unifier
operation or mgu. Details about the mgu can be found in [Llo87].

We construct $U(T)$ from $T$ by considering every clause separately. For a
clause $C$ in $T$ with set of terms $\mathcal{T}$, we generate a set of clauses $U(C)$. To do
that, consider all partitions of the terms in $\mathcal{T}$; each such partition, say $\pi$, can
generate a clause of $U(C)$, denoted $U_\pi(C)$. Therefore, $U(T) = \bigwedge_{C \in T} U(C)$ and
$U(C) = \bigwedge_{\pi \in Partitions(\mathcal{T})} U_\pi(C)$. The clause $U_\pi(C)$ is computed as follows.
Taking one class at a time, compute its mgu if possible. If there is no mgu, discard
that partition. Otherwise, apply the unifying substitution to the rest of elements
in classes not handled yet, and continue with the following class. If the
representatives (see footnote 2) of any two distinct classes happen to be equal, then discard that
partition as well. This is because the inequality between the representatives of

2 We call the representative of a class any of its elements applied to the mgu.

1. Set $U(T)$ to be the empty expression ($T$ is the expression to be transformed).

2. For every clause $C = s_c \to b_c$ in $T$ and for every partition $\pi$ of the set of terms
(and subterms) appearing in $C$ do

- Let the partition $\pi$ be $\{\pi_1, ..., \pi_l\}$. Set $\sigma_0$ to $\emptyset$.

- For $i = 1$ to $l$ do

  • If $\pi_i \cdot \sigma_{i-1}$ is unifiable, then $\theta_i = mgu(\pi_i \cdot \sigma_{i-1})$ and $\sigma_i = \sigma_{i-1} \cdot \theta_i$.

  • Otherwise, discard the partition.

- If there are two classes $\pi_i$ and $\pi_j$ ($i \neq j$) such that $\pi_i \cdot \sigma_l = \pi_j \cdot \sigma_l$, then
discard the partition.

- Otherwise, set $U_\pi(C) = ineq(s_c \cdot \sigma_l), s_c \cdot \sigma_l \to b_c \cdot \sigma_l$ and $U(T) = U(T) \wedge U_\pi(C)$.

3. Return $U(T)$.



Fig. 1. The transformation algorithm 



those two classes will never be satisfied (they are equal!), and the resulting clause 
is superfluous. When all classes have been unified, we proceed to translate the 
clause C. All (top-level) terms appearing in C are substituted by the mgu found 
for the class they appear in, and the inequalities are included in the antecedent. 
This gives the transformed clause $U_\pi(C)$.

Example 2. Let the clause to be transformed be $C = p(f(x), f(y), g(z)) \to q(x, y, z)$.
The terms appearing in $C$ are $\{x, y, z, f(x), f(y), g(z)\}$. We consider
some possible partitions:

- When $\pi = \{x, y\}, \{z\}, \{f(x), f(y)\}, \{g(z)\}$:
  $C \cdot \sigma_l = p(f(x), f(x), g(z)) \to q(x, x, z)$ and
  $U_\pi(C) = (x \neq z \neq f(x) \neq g(z)), p(f(x), f(x), g(z)) \to q(x, x, z)$.

- When $\pi = \{x, y, z\}, \{f(x), g(z)\}, \{f(y)\}$:

  Stage   mgu            θ            σ            Partitions left
  0                                   ∅            {x,y,z}, {f(x),g(z)}, {f(y)}
  1       {x,y,z}        {y,z ↦ x}    {y,z ↦ x}    {f(x),g(x)}, {f(x)}
  2       {f(x),g(x)}    No mgu                    PARTITION DISCARDED
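
The computation of $U_\pi(C)$ for a single partition can be sketched as below. This is an illustrative reading of Figure 1, not code from the paper: the term encoding (variables as strings, function terms and atoms as tuples), the function names and the textbook unification routine are all assumptions. The inequalities are returned as pairs of class representatives, which coincide with the terms of the transformed antecedent when the partition covers all terms of the clause.

```python
from itertools import combinations


def is_var(t):
    return isinstance(t, str)


def apply(t, sub):
    """Apply a substitution (dict from variable to term) to a term or atom."""
    if is_var(t):
        return apply(sub[t], sub) if t in sub else t
    return (t[0],) + tuple(apply(a, sub) for a in t[1:])


def occurs(v, t):
    return v == t if is_var(t) else any(occurs(v, a) for a in t[1:])


def unify(a, b, sub):
    """Extend `sub` so that a and b become equal; None if there is no mgu."""
    a, b = apply(a, sub), apply(b, sub)
    if a == b:
        return sub
    if is_var(b):
        return None if occurs(b, a) else {**sub, b: a}
    if is_var(a):
        return None if occurs(a, b) else {**sub, a: b}
    if a[0] != b[0] or len(a) != len(b):
        return None
    for x, y in zip(a[1:], b[1:]):
        sub = unify(x, y, sub)
        if sub is None:
            return None
    return sub


def partitions(items):
    """Enumerate all partitions of a list into non-empty classes."""
    if not items:
        yield []
        return
    first, rest = items[0], items[1:]
    for p in partitions(rest):
        for i in range(len(p)):
            yield p[:i] + [[first] + p[i]] + p[i + 1:]
        yield [[first]] + p


def transform(antecedent, consequent, partition):
    """Return (inequalities, antecedent, consequent) of U_pi(C), or None."""
    sub = {}
    for cls in partition:                      # one class at a time
        for t in cls[1:]:
            sub = unify(cls[0], t, sub)
            if sub is None:                    # no mgu: discard the partition
                return None
    reps = [apply(cls[0], sub) for cls in partition]
    if len(set(reps)) < len(reps):             # two classes collapsed together
        return None
    ineqs = list(combinations(reps, 2))
    return ineqs, [apply(atom, sub) for atom in antecedent], apply(consequent, sub)


# The clause of Example 2: p(f(x), f(y), g(z)) -> q(x, y, z).
fx, fy, gz = ("f", "x"), ("f", "y"), ("g", "z")
body, head = [("p", fx, fy, gz)], ("q", "x", "y", "z")
kept = transform(body, head, [["x", "y"], ["z"], [fx, fy], [gz]])
dropped = transform(body, head, [["x", "y", "z"], [fx, gz], [fy]])
```

Calling `transform` on every element of `partitions(...)` over the six terms of the clause would give all of $U(C)$; the two calls above reproduce the two cases worked through in Example 2.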



If the target expression $T$ has $m$ clauses, then the number of clauses in the
transformation $U(T)$ is bounded by $m t^t$, with $t$ being the maximum number of
distinct terms appearing in one clause of $T$ (the number of partitions of a set
with $t$ elements is bounded by $t^t$). Notice however that there are many partitions
that will be discarded by the process, for example all those partitions containing
some class with two functional terms with different top-level function symbol,
or partitions containing some class with two terms such that one is a subterm
of the other. Therefore, the number of clauses in the transformation will be in
practice much smaller than $m t^t$.

The transformation $U(T)$ of a range restricted expression $T$ is also range
restricted. It can also be proved that $T \models U(T)$, since every clause in $U(T)$ is
subsumed by some clause in $T$. As a consequence, $U(T) \models C$ implies $T \models C$
and hence $U(T) \models [s, c]$ implies $T \models [s, c]$.



3 The Algorithm 

The algorithm keeps a sequence $S$ of representative counterexamples. The
hypothesis $H$ is generated from this sequence, and the main task of the algorithm
is to refine the counterexamples in $S$ in order to get a more accurate hypothesis
in each iteration of the main loop (line 2), until hypothesis and target expression
coincide.

There are two basic operations on counterexamples that need to be explained
in detail. These are minimisation (line 2b), which takes a counterexample as
given by the equivalence oracle and produces a positive, full counterexample;
and pairing (line 2c), which takes two counterexamples and generates a series of
candidate counterexamples. The counterexamples obtained by combination of
previous ones (by pairing them) are the candidates to refine the sequence $S$.
These operations are carefully explained in the following Sections 3.1 and 3.2.

The algorithm uses the procedure rhs. The first version of rhs has one input
parameter only. Given a set of literals $s$, $rhs(s)$ computes the set of all
literals not in $s$ implied by $s$ w.r.t. the target expression. That is,
$rhs(s) = \{b \notin s \mid EntMQ(s \to b) = Yes\}$.

The second version of rhs has two input parameters. Given the sets $s$ and $c$,
$rhs(s, c)$ outputs those literals in $c$ that are implied by $s$ w.r.t. the target
expression. That is, $rhs(s, c) = \{b \in c \mid b \notin s \text{ and } EntMQ(s \to b) = Yes\}$. Notice
that in both cases the literals $b$ considered are literals containing terms that
appear in $s$ only. Both versions ask membership queries to find out which of the
possible consequents are correct. The resulting sets are in both cases finite since
the target expression is range restricted.
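
Both versions of rhs reduce to filtering a candidate set through the membership oracle. The sketch below makes that explicit. The oracle is passed in as a function, and the one used in the demonstration is a hypothetical stand-in that answers queries only for the single-clause target p(f(x)) -> q(x); the encoding (variables as strings, other terms and atoms as tuples) and the `predicates` signature argument are likewise assumptions of this illustration.

```python
from itertools import product


def subterms(t):
    yield t
    if isinstance(t, tuple):
        for arg in t[1:]:
            yield from subterms(arg)


def terms_of(atoms):
    return {u for atom in atoms for arg in atom[1:] for u in subterms(arg)}


def rhs(s, c, ent_mq):
    """Two-parameter version: the literals of c, not in s, that s implies."""
    return {b for b in c if b not in s and ent_mq(s, b)}


def rhs_all(s, predicates, ent_mq):
    """One-parameter version: candidates are all atoms over the terms of s.

    `predicates` maps each predicate symbol to its arity (an assumed input).
    """
    terms = list(terms_of(s))
    candidates = {(p,) + args
                  for p, arity in predicates.items()
                  for args in product(terms, repeat=arity)}
    return rhs(s, candidates, ent_mq)


# Hypothetical oracle for the target p(f(x)) -> q(x): s -> q(t) is entailed
# exactly when p(f(t)) is in s (literals already in s are filtered by rhs).
def toy_ent_mq(s, b):
    return b[0] == "q" and ("p", ("f", b[1])) in s


a, b_const = ("a",), ("b",)
s = {("p", ("f", a)), ("q", b_const)}        # the antecedent p(f(a)), q(b)
consequents = rhs_all(s, {"p": 1, "q": 1}, toy_ent_mq)
```

The candidate set is finite because only terms already present in $s$ are used to build literals, which mirrors the range restriction argument in the text.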

3.1 Minimising the Counterexample 

The minimisation procedure has to transform a counterexample clause x as 
generated by the equivalence query oracle into a multi-clause counterexample 
$[s_x, c_x]$ ready to be handled by the learning algorithm. This is done by removing
literals and generalising terms. 

1. Set $S$ to be the empty sequence and $H$ to be the empty hypothesis.

2. Repeat until EntEQ($H$) returns “Yes”:

(a) Let $x$ be the (positive) counterexample received ($T \models x$ and $H \not\models x$).

(b) Minimise counterexample $x$ - use calls to EntMQ.
Let $[s_x, c_x]$ be the minimised counterexample produced.

(c) Find the first $[s_i, c_i] \in S$ such that there is a basic pairing $[s, c]$ of terms of
$[s_i, c_i]$ and $[s_x, c_x]$ satisfying:

i. $size(s) < size(s_i)$

ii. $rhs(s, c) \neq \emptyset$

(d) If such an $[s_i, c_i]$ is found then replace it by the multi-clause $[s, rhs(s, c)]$.

(e) Otherwise, append $[s_x, c_x]$ to $S$.

(f) Set $H$ to be $\bigwedge_{[s,c] \in S} \{s \to b \mid b \in c\}$.

3. Return $H$.



Fig. 2. The learning algorithm



1. Let $x$ be the counterexample obtained by the EntEQ oracle.

2. Let $s_x$ be the set of literals $\{b \mid H \models antecedent(x) \to b\}$ and set $c_x$ to $rhs(s_x)$.

3. Repeat until no more changes are made

- For every functional term $t$ appearing in $s_x$, in decreasing order of size, do

  • Let $[s'_x, c'_x]$ be the multi-clause obtained from $[s_x, c_x]$ after substituting
all occurrences of the term $f(t)$ by a new variable $x_{f(t)}$.

  • If $rhs(s'_x, c'_x) \neq \emptyset$, then set $[s_x, c_x]$ to $[s'_x, rhs(s'_x, c'_x)]$.

4. Repeat until no more changes are made

- For every term $t$ appearing in $s_x$, in increasing order of size, do

  • Let $[s'_x, c'_x]$ be the multi-clause obtained after removing from $[s_x, c_x]$
all those literals containing $t$.

  • If $rhs(s'_x, c'_x) \neq \emptyset$, then set $[s_x, c_x]$ to $[s'_x, rhs(s'_x, c'_x)]$.

5. Return $[s_x, c_x]$.



Fig. 3. The minimisation procedure



The minimisation procedure constructs first a full multi-clause that will be
refined in the following steps. To do this, all literals implied by antecedent(x)
and the clauses in the hypothesis will be included in the first version of the new
counterexample’s antecedent, $s_x$ (line 2). This can be done by forward chaining
using the hypothesis’ clauses, starting with the literals in antecedent(x).
Finally, the consequent of the first version of the new counterexample ($c_x$) will
be constructed as $rhs(s_x)$.

Next, we enter the loop in which terms are generalised (line 3). We do this
by considering every term that is not a variable (constants are also included),
one at a time. The way to proceed is to substitute every occurrence of the term
by a new variable, and then check whether the multi-clause is still positive. If so,
the counterexample is updated to the new multi-clause obtained. The process
finishes when there are no terms to be generalised in $[s_x, c_x]$. Note that if some
term cannot be generalised, it will stay so during the computation of this loop, so
that by keeping track of the failures, unnecessary computation time and queries
can be saved.

Finally, we enter the loop in which literals are removed (line 4). We do this by
considering one term at a time. We remove every literal containing that term in
$s_x$ and $c_x$ and check if the multi-clause is still positive. If so, the counterexample
is updated to the new multi-clause obtained. The process finishes when there
are no terms to be dropped in $[s_x, c_x]$.
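
The two refinement loops of the minimisation procedure (lines 3 and 4 of Figure 3) can be sketched as follows. Several things here are assumptions of the illustration rather than details taken from the paper: the sketch starts from a multi-clause that is already full (the output of line 2), it restarts a loop after each successful change, it does not cache failed generalisations, the fresh variable names X0, X1, ... are assumed unused, and the membership oracle is a hypothetical stand-in for the single-clause target p(f(x)) -> q(x).

```python
from itertools import count


def is_var(t):
    return isinstance(t, str)


def subterms(t):
    yield t
    if not is_var(t):
        for a in t[1:]:
            yield from subterms(a)


def terms_of(atoms):
    return {u for atom in atoms for a in atom[1:] for u in subterms(a)}


def size(t):
    return 1 if is_var(t) else 2 + sum(size(a) for a in t[1:])


def replace(t, old, new):
    """Replace every occurrence of the term `old` inside t by `new`."""
    if t == old:
        return new
    if is_var(t):
        return t
    return (t[0],) + tuple(replace(a, old, new) for a in t[1:])


def replace_in(atoms, old, new):
    return {(a[0],) + tuple(replace(x, old, new) for x in a[1:]) for a in atoms}


def rhs(s, c, ent_mq):
    return {b for b in c if b not in s and ent_mq(s, b)}


def minimise(s, c, ent_mq):
    fresh = count()
    changed = True
    while changed:                      # line 3: generalise terms
        changed = False
        functional = [t for t in terms_of(s) if not is_var(t)]
        for t in sorted(functional, key=size, reverse=True):
            v = f"X{next(fresh)}"
            s2, c2 = replace_in(s, t, v), replace_in(c, t, v)
            kept = rhs(s2, c2, ent_mq)
            if kept:                    # still positive: accept and restart
                s, c, changed = s2, kept, True
                break
    changed = True
    while changed:                      # line 4: drop literals
        changed = False
        for t in sorted(terms_of(s), key=size):
            s2 = {a for a in s if t not in terms_of([a])}
            c2 = {a for a in c if t not in terms_of([a])}
            kept = rhs(s2, c2, ent_mq)
            if kept:
                s, c, changed = s2, kept, True
                break
    return s, c


# Hypothetical oracle for the target p(f(x)) -> q(x).
def toy_ent_mq(s, b):
    return b[0] == "q" and ("p", ("f", b[1])) in s


a, b_const = ("a",), ("b",)
s_min, c_min = minimise({("p", ("f", a)), ("q", b_const)}, {("q", a)}, toy_ent_mq)
```

Started from the multi-clause [p(f(a)), q(b) -> q(a)], this sketch ends with a single antecedent literal of the form p(f(X)) and the consequent q(X), that is, a syntactic variant of the target clause.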



Example 3. Parentheses are omitted and the function $f$ is unary. Let $T$ be the
single clause $p(fx) \to q(x)$. We start with counterexample $[p(fa), q(b) \to q(a)]$
as obtained after step 2 of the minimisation procedure.

3.2 Pairings

A crucial process in the algorithm is how two counterexamples are combined 
into a new one, hopefully yielding a better approximation of some target clause. 
The operation proposed here uses pairings of clauses, based on the lgg.

We have two multi-clauses, $[s_x, c_x]$ and $[s_i, c_i]$, that need to be combined. To
do so, we generate a series of matchings between the terms of $s_x$ and $s_i$, and any
of these matchings will produce the candidate to refine the sequence $S$.



Matchings. A matching is a set whose elements are pairs of terms $t_x - t_i$, where
$t_x \in s_x$ and $t_i \in s_i$. If $s_x$ contains fewer terms than $s_i$, then there should be an
entry in the matching for every term in $s_x$. Otherwise, there should be an entry
for every term in $s_i$. That is, the number of entries in the matching equals the
minimum of the number of terms in $s_x$ and $s_i$. We only use 1-1 matchings, i.e.,
once a term has been included in the matching it cannot appear in any other
entry of the matching. Usually, we denote a matching by the Greek letter $\sigma$.

Example 4. Let $[s_x, c_x]$ be $[\{p(a, b)\}, \{q(a)\}]$ with terms $\{a, b\}$. Let $[s_i, c_i]$ be
$[\{p(f(1), 2)\}, \{q(f(1))\}]$ with terms $\{1, 2, f(1)\}$. The possible matchings are:

$\sigma_1 = \{a - 1, b - 2\}$       $\sigma_3 = \{a - 2, b - 1\}$       $\sigma_5 = \{a - f(1), b - 1\}$
$\sigma_2 = \{a - 1, b - f(1)\}$    $\sigma_4 = \{a - 2, b - f(1)\}$    $\sigma_6 = \{a - f(1), b - 2\}$

An extended matching is an ordinary matching with an extra column added
to every entry of the matching. This extra column contains the lgg of every pair
in the matching. The lggs are simultaneous, that is, they share the same table.

An extended matching $\sigma$ is legal if every subterm of some term appearing
as the lgg of some entry, also appears as the lgg of some other entry of $\sigma$. An
ordinary matching is legal if its extension is.

Example 5. Let $\sigma_1$ be $\{a - c,\ f(a) - b,\ f(f(a)) - f(b),\ g(f(f(a))) - g(f(f(c)))\}$
and $\sigma_2 = \{a - c,\ f(a) - b,\ f(f(a)) - f(b)\}$. The matching $\sigma_1$ is not legal, since
the term $f(X)$ is not present in its extension column and it is a subterm of
$g(f(f(X)))$, which is present. The matching $\sigma_2$ is legal.

Extended $\sigma_1$                              Extended $\sigma_2$

[a - c => X]                                [a - c => X]
[f(a) - b => Y]                             [f(a) - b => Y]
[f(f(a)) - f(b) => f(Y)]                    [f(f(a)) - f(b) => f(Y)]
[g(f(f(a))) - g(f(f(c))) => g(f(f(X)))]
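The legality test of Example 5 can be sketched as follows. This is an illustration under stated assumptions: functional terms are encoded as nested tuples such as `("f", "a")` for f(a), constants are lowercase strings, the fresh variables of the lgg are named `X1`, `X2`, ... (so the variable called Y above appears as `X2`), and the names `lgg`, `proper_subterms` and `is_legal` are hypothetical helpers.

```python
def lgg(t1, t2, table):
    """lgg of two terms, sharing one variable table across all entries."""
    if (isinstance(t1, tuple) and isinstance(t2, tuple)
            and t1[0] == t2[0] and len(t1) == len(t2)):
        # same functor and arity: generalise argument by argument
        return (t1[0],) + tuple(lgg(a, b, table) for a, b in zip(t1[1:], t2[1:]))
    if t1 == t2:
        return t1
    # differing terms: reuse the variable already assigned to this pair, if any
    if (t1, t2) not in table:
        table[(t1, t2)] = "X%d" % (len(table) + 1)
    return table[(t1, t2)]

def proper_subterms(t):
    """Yield every proper subterm of a tuple-encoded term."""
    if isinstance(t, tuple):
        for arg in t[1:]:
            yield arg
            yield from proper_subterms(arg)

def is_legal(matching):
    """A matching is legal if every subterm of an lgg is itself the lgg of an entry."""
    table = {}
    column = [lgg(a, b, table) for a, b in matching]
    present = set(column)
    return all(s in present for g in column for s in proper_subterms(g))

fa, fb = ("f", "a"), ("f", "b")
ffa, ffc = ("f", fa), ("f", ("f", "c"))
sigma1 = [("a", "c"), (fa, "b"), (ffa, fb), (("g", ffa), ("g", ffc))]
sigma2 = sigma1[:3]
print(is_legal(sigma1), is_legal(sigma2))  # False True
```

For `sigma1` the extension column contains g(f(f(X1))) but neither f(f(X1)) nor f(X1), so the check fails, in line with Example 5.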

Our algorithm considers a yet more restricted type of matching. A basic
matching $\sigma$ is defined for two multi-clauses $[s_x, c_x]$ and $[s_i, c_i]$ such that the
number of terms in $s_x$ is less than or equal to the number of terms in $s_i$. It
is a 1-1, legal matching such that if entry $f(t_1, \ldots, t_n) - g(r_1, \ldots, r_m) \in \sigma$, then
$f = g$, $n = m$ and $t_i - r_i \in \sigma$ for all $i = 1, \ldots, n$. Notice this is not a symmetric
operation, since $[s_x, c_x]$ is required to have at most as many distinct terms as $[s_i, c_i]$.
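The structural part of this definition can be checked mechanically. The sketch below is only an illustration: it assumes terms are encoded as nested tuples (`("f", "x")` for f(x)) with strings for variables and constants, the name `satisfies_basic_condition` is hypothetical, and the 1-1 and legality requirements are deliberately not re-checked here.

```python
def satisfies_basic_condition(matching):
    """Check f = g, n = m and argument-wise entries for every functional entry."""
    entries = set(matching)
    for left, right in entries:
        if not isinstance(left, tuple):
            continue  # variables and constants are not checked in this sketch
        if not isinstance(right, tuple):
            return False  # functional term paired with a non-functional one
        if left[0] != right[0] or len(left) != len(right):
            return False  # different function symbol or different arity
        if any((t, r) not in entries for t, r in zip(left[1:], right[1:])):
            return False  # some argument pair is missing from the matching
    return True

# A legal but non-basic matching: f(a) is paired with the constant b.
legal_not_basic = [("a", "c"), (("f", "a"), "b"), (("f", ("f", "a")), ("f", "b"))]
# A matching whose functional entry is supported by its argument entry.
basic_candidate = [("x", "1"), ("a", "a"), (("f", "x"), ("f", "1"))]
print(satisfies_basic_condition(legal_not_basic))   # False
print(satisfies_basic_condition(basic_candidate))   # True
```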

To construct basic matchings given $[s_x, c_x]$ and $[s_i, c_i]$, consider all possible
matchings between the variables in $s_x$ and the terms in $s_i$ only. Complete them
by adding the functional terms in $s_x$ that are not yet included in the basic
matching in an upwards fashion, beginning with the simpler terms. For
every term $f(t_1, \ldots, t_n)$ in $s_x$ such that all $t_i - r_i$ (with $i = 1, \ldots, n$) appear
already in the basic matching, add a new entry $f(t_1, \ldots, t_n) - f(r_1, \ldots, r_n)$. Notice
this is not possible if $f(r_1, \ldots, r_n)$ does not appear in $s_i$ or the term $f(r_1, \ldots, r_n)$
has already been used. In this case, we cannot complete the matching and it is
discarded. Otherwise, we continue until all terms in $s_x$ appear in the matching.
By construction, constants in $s_x$ must be matched to the same constants in $s_i$.

Example 6. Let $s_x$ be {p(a, fx)} containing the terms {a, x, fx}. Let $s_i$ be
{p(a, f1), p(a, 2)} containing the terms {a, 1, 2, f1}. No parentheses for functions
are written. The basic matchings to consider are the following.

- Starting with [x - a]: cannot add [a - a], therefore discarded.

- Starting with [x - 1]: can be completed with [a - a] and [fx - f1].

- Starting with [x - 2]: cannot add [fx - f2], therefore discarded.

- Starting with [x - f1]: cannot add [fx - ff1], therefore discarded.
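The construction procedure, applied to Example 6, can be sketched as follows. The encoding is an assumption for illustration: functional terms are nested tuples (`("f", "x")` for fx), variables and constants are strings, the caller states which strings are variables, and the term set of $s_x$ is assumed to be closed under subterms. The names `size` and `basic_matchings` are hypothetical. The sketch follows only the construction steps described above and does not run a separate legality check.

```python
from itertools import permutations

def size(t):
    """Number of symbols in a term; used to process simpler terms first."""
    return 1 + sum(size(a) for a in t[1:]) if isinstance(t, tuple) else 1

def basic_matchings(terms_x, variables_x, terms_i):
    """Build candidate basic matchings as dicts {term of s_x: term of s_i}."""
    terms_i = list(terms_i)
    vars_x = [t for t in terms_x if t in variables_x]
    rest = sorted((t for t in terms_x if t not in variables_x), key=size)
    found = []
    # Start from every 1-1 assignment of the variables of s_x to terms of s_i.
    for image in permutations(terms_i, len(vars_x)):
        matching = dict(zip(vars_x, image))
        used = set(image)
        for t in rest:
            if isinstance(t, tuple):
                if any(arg not in matching for arg in t[1:]):
                    break  # an argument has no entry yet: give up on this candidate
                target = (t[0],) + tuple(matching[arg] for arg in t[1:])
            else:
                target = t  # a constant can only be matched to itself
            if target not in terms_i or target in used:
                break  # cannot complete the matching: discard it
            matching[t] = target
            used.add(target)
        else:
            found.append(matching)  # every term of s_x received an entry
    return found

terms_x = ["a", "x", ("f", "x")]
terms_i = ["a", "1", "2", ("f", "1")]
result = basic_matchings(terms_x, {"x"}, terms_i)
print(len(result))  # 1
```

Only the candidate starting with [x - 1] survives, which agrees with the four cases listed in Example 6.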

One of the key points of our algorithm lies in reducing the number of matchings
that need to be checked, by ruling out candidate matchings that do not satisfy
certain restrictions. By doing so we avoid testing too many
pairings and hence avoid making unnecessary calls to the oracles. One of the
restrictions has already been mentioned: it consists in considering basic pairings




